Use cuda-pathfinder in build-system.requires#1817
Conversation
Pin cuda-pathfinder>=1.5 in both build-system.requires and project.dependencies. Made-with: Cursor
The previous 1.3.4a0 was a stale value that could confuse someone. Made-with: Cursor
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
/ok to test |
|
| COMPILE_FOR_COVERAGE = bool(int(os.environ.get("CUDA_PYTHON_COVERAGE", "0"))) | ||
|
|
||
|
|
||
| # Please keep in sync with the copy in cuda_bindings/build_hooks.py. |
There was a problem hiding this comment.
We should avoid duplicating implementations. Could we not hoist this into a separate file and import from both?
There was a problem hiding this comment.
We should avoid duplicating implementations.
Agreed
Could we not hoist this into a separate file and import from both?
No 😭
In the words of Cursor Opus 4.6 1M Thinking:
There's no reasonable way to share code between the two build_hooks.py files:
- Each is a PEP 517 build backend loaded via
backend-path=["."]from its own package directory (cuda_bindings/orcuda_core/). They can't import from the monorepo root or from each other. - The function exists because
cuda.pathfindercan't be imported normally in this context — so it can't live insidecuda-pathfindereither. - A shared helper package in
build-system.requireswould be massive overkill for ~15 lines of stable code.
The only alternatives are worse: a new PyPI package just for this, importlib file-path hacks, or copying a shared file into both directories at CI time.
The "please keep in sync" comments are the pragmatic solution here.
There was a problem hiding this comment.
We have other existing cases of "please keep in sync" that are impractical to avoid.
It just crossed my mind: a pre-commit check to enforce that they stay in sync is probably pretty easy to implement.
| if os.path.isdir(os.path.join(sp_cuda, "pathfinder")): | ||
| cuda.__path__ = list(cuda.__path__) + [sp_cuda] | ||
| break | ||
| import cuda.pathfinder |
There was a problem hiding this comment.
Should we include some message explaining that we iterated through sys.path looking for pathfinder and couldn't find it? That seems to be important information that would be missing from just throwing we couldn't import pathfinder module.
There was a problem hiding this comment.
for p in sys.path:
sp_cuda = os.path.join(p, "cuda")
if os.path.isdir(os.path.join(sp_cuda, "pathfinder")):
cuda.__path__ = list(cuda.__path__) + [sp_cuda]
break
else:
raise ModuleNotFoundError(
"cuda-pathfinder is not installed in the build environment. "
"Ensure 'cuda-pathfinder>=1.5' is in build-system.requires."
)
import cuda.pathfinder ```
cpcloud
left a comment
There was a problem hiding this comment.
The path hacking needs some more justification and explicit examples of in what common scenarios a plain import isn't going to work.
| for p in sys.path: | ||
| sp_cuda = os.path.join(p, "cuda") | ||
| if os.path.isdir(os.path.join(sp_cuda, "pathfinder")): | ||
| cuda.__path__ = list(cuda.__path__) + [sp_cuda] | ||
| break |
There was a problem hiding this comment.
Path hacking of any sort like this is a code smell. At the very least the docstring is missing a concrete example of why this is needed.
isolated build environments
is doing a ton of work there. Isolated how? What is backend-path and where does that come from? Where are all these nouns like backend-path and build-env coming from?
In what real-world scenarios does this actually happen where there's not a viable alternative?
I'm really skeptical that path hacking like this is necessary.
Made-with: Cursor
cpcloud
left a comment
There was a problem hiding this comment.
Main concern: this workaround solves one specific PEP 517 shadowing scenario by mutating global namespace-package state (cuda.__path__). That is risky in a build backend because the rest of the process now sees a different import model depending on whether this helper has already run.
The two concrete issues I see are:
except ModuleNotFoundErroris broad enough to catch real import failures insidecuda.pathfinder, so a broken dependency/import in that package can be misdiagnosed as namespace shadowing.- Replacing importlib's
_NamespacePathwith a plain list and appending the firstcuda/pathfinderhit fromsys.pathmakes the selected package "sticky" for the rest of the process and depends on path ordering heuristics.
I'm commenting on the cuda_bindings copy only to avoid duplication, but the same concerns apply to the mirrored helper in cuda_core.
cuda_bindings/build_hooks.py
Outdated
| """ | ||
| try: | ||
| import cuda.pathfinder | ||
| except ModuleNotFoundError: |
There was a problem hiding this comment.
This except ModuleNotFoundError is too broad: it will also catch ModuleNotFoundErrors raised from inside cuda.pathfinder itself. If cuda.pathfinder is installed but one of its own imports is broken, we silently reinterpret that as the namespace-shadowing case and take the workaround path.
I'd narrow the fallback to the exact failure we expect here:
try:
from cuda.pathfinder import get_cuda_path_or_home
return get_cuda_path_or_home
except ModuleNotFoundError as exc:
if exc.name != "cuda.pathfinder":
raiseThat preserves the real traceback for genuine cuda.pathfinder import bugs and keeps the workaround limited to the cuda.pathfinder-not-found case.
There was a problem hiding this comment.
Thanks for catching this (no pun intended)!
Done: commit b779d0d
| for p in sys.path: | ||
| sp_cuda = os.path.join(p, "cuda") | ||
| if os.path.isdir(os.path.join(sp_cuda, "pathfinder")): | ||
| cuda.__path__ = list(cuda.__path__) + [sp_cuda] |
There was a problem hiding this comment.
Mutating cuda.__path__ here permanently changes namespace resolution for the rest of the backend process, and it selects whichever cuda/pathfinder directory happens to appear first on sys.path. That's pretty fragile.
If we keep this overall approach, I think the safer version is to resolve the installed cuda-pathfinder distribution explicitly and append only that exact cuda/ directory, instead of scanning sys.path:
from importlib import metadata
from pathlib import Path
# after narrowing the ModuleNotFoundError as above
import cuda
dist = metadata.distribution("cuda-pathfinder")
site_cuda = str(dist.locate_file(Path("cuda")))
cuda_paths = list(cuda.__path__)
if site_cuda not in cuda_paths:
cuda.__path__ = cuda_paths + [site_cuda]
from cuda.pathfinder import get_cuda_path_or_home
return get_cuda_path_or_homeThat still uses the namespace workaround, but it removes the sys.path guesswork.
If you'd rather avoid cuda.__path__ mutation entirely, another option is to load cuda/pathfinder/_utils/env_vars.py directly from the installed distribution under a private module name and call get_cuda_path_or_home() from there. That file is stdlib-only today, so it avoids rewriting namespace-package state.
There was a problem hiding this comment.
I think the safer version is to resolve the installed cuda-pathfinder distribution explicitly
Yes, much better, thanks!
Done: commit 6facaa5
another option is to load cuda/pathfinder/_utils/env_vars.py directly
The answer to this goes deeper, I rewrote the PR description to make this clearer. Quoting from there:
The main point of this PR is to firmly establish cuda-pathfinder as a tool usable at build time, not just at runtime.
We don't want to advertise a helper function that reaches into pathfinder internals.
There was a problem hiding this comment.
Unfortunately it looks like we have to backtrack to the sys.path scan. Here is a Cursor writeup, based on information generated via commit 3cb0313:
We tried replacing the sys.path scan with importlib.metadata per your suggestion. It works locally but fails in CI (cibuildwheel / manylinux). Here's why.
Diagnostic output (linux-64, py3.12)
dist._path: /project/cuda_bindings/cuda_bindings.egg-info
dist._path.parent: /project/cuda_bindings
locate_file('cuda'): /project/cuda_bindings/cuda
locate_file exists: True
locate_file/pathfinder exists: False
cuda.__path__ (before): _NamespacePath(['/project/cuda_bindings/cuda'])
sys.path:
/tmp/build-env-_bna8wjb/lib/python3.12/site-packages -> cuda/pathfinder exists: True
cuda.__path__ (after): _NamespacePath(['/project/cuda_bindings/cuda'])
What happens
-
distribution("cuda-pathfinder")finds the project's owncuda_bindings.egg-info(created by the setuptools metadata step), not thecuda-pathfinderpackage installed in the build env under/tmp/build-env-…/lib/python3.12/site-packages/. -
So
dist.locate_file(Path("cuda"))returns/project/cuda_bindings/cuda— the shadowed directory itself. Adding it tocuda.__path__is a no-op. -
Separately,
cuda.__path__remained a_NamespacePathafter our plain-list assignment. The_NamespacePathdescriptor recalculates on access, discarding the mutation. (The old working code assignedcuda.__path__ = list(cuda.__path__) + [sp_cuda]without a precedingimport cudathat could re-trigger recalculation.)
Conclusion
importlib.metadata is unreliable here because the in-tree egg-info shadows the installed distribution. The sys.path scan is the correct approach — it directly verifies that cuda/pathfinder/ exists on disk and is not fooled by metadata from the wrong package.
CI run with diagnostics: https://github.com/NVIDIA/cuda-python/actions/runs/23809723643
There was a problem hiding this comment.
Another Cursor writeup, based on the diagnostic results:
Regarding the two concerns about the sys.path scan:
"First hit wins" ordering: In this context there is exactly one cuda-pathfinder installed in the build environment — pip deduplicates. So there is exactly one sys.path entry with cuda/pathfinder/ on disk. "First hit" finds the only hit.
Sticky cuda.__path__ mutation: The build backend is an ephemeral subprocess that builds one wheel and exits. The mutation corrects an already-wrong _NamespacePath (which only points to the project's cuda/ directory) to what it should have been (both the project's and site-packages' cuda/ directories). "Permanent" means the remaining ~seconds of that subprocess's lifetime.
And as shown in the diagnostic run, importlib.metadata cannot be used as an alternative because the in-tree cuda_bindings.egg-info shadows the installed cuda-pathfinder distribution — distribution("cuda-pathfinder") resolves to the project's own metadata, not the build-env's site-packages copy.
Made-with: Cursor
…anning sys.path Made-with: Cursor
|
Converting back to draft, until I figure out why all CI builds failed. |
|
/ok to test |
…17 builds" This reverts commit 3cb0313.
|
/ok to test |
|
@cpcloud this is ready for review again. |
Follow-up to #1801. Closes the build-time integration gap identified in #1803.
The main point of this PR is to firmly establish
cuda-pathfinderas a tool usable at build time, not just at runtime.The build hooks in
cuda_bindingsandcuda_corenow usecuda.pathfinder.get_cuda_path_or_home()to resolveCUDA_PATH/CUDA_HOME, replacing the previous inlineos.environ.get()fallback.Unfortunately a helper function (
_import_get_cuda_path_or_home()) is needed because PEP 517 isolated build environments shadow thecudanamespace package, preventing normalimport cuda.pathfinder. Issue #1824 serves as a living document for advertising the helper function. The use incuda_bindingsandcuda_coreis essentially a unit test to ensure the helper function works as intended. Issue #1824 will be updated as needed. It is also the place where users can report problems.Technical details
cuda-pathfinder>=1.5in bothbuild-system.requiresandproject.dependenciesforcuda_bindingsandcuda_core._import_get_cuda_path_or_home()helper detects namespace shadowing and usesimportlib.metadatato precisely locate the installedcuda-pathfinder, then fixescuda.__path__so the import succeeds.cuda_core/tests/test_build_hooks.pyaddsget_cuda_path_or_home.cache_clear()calls so monkeypatched env vars take effect.cuda_pathfinder/pixi.tomlpackage version from1.3.4a0to1.5.0to avoid confusion.